
Web Collection 2012

Web crawl data from the year 2012. Some of this data is currently not publicly accessible.

192,582 results


Part of: Web Collections

Media Type
web: 192,582

Topics & Subjects
crawldata: 152,948
wiki: 3,505
dumps: 3,497
incremental: 3,494
media: 3,494
tape: 3,494

Collection
Web Crawls: 192,161
Internet Archive Web Crawls: 134,404
Worldwide Web Crawls: 95,684
Wide Crawl started April 2012: 39,252
Youtube Videos: 32,297
Wide Crawl started January 2012: 30,362

Creator
internet archive: 140,980
archive-it: 21,568
thumper2.php: 11,966
wikimedia projects editors: 3,515
portuguese web archive: 273
www.engadget.com: 17

Language
English: 3,037
Portuguese: 275
Italian: 26
Anyhub: 19
Swedish: 13
German: 9
Wide Crawl started January 2012
Media type: web; 4.2M views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
vkontakte.ru
Media type: web; 2M views; 0 favorites; 0 comments
Source: vkontakte.ru
Wide Crawl started January 2012
Media type: web; 1.6M views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 962,428 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-20T21:59:15 UTC to 2012-07-21T06:44:52 UTC.
Topic: crawldata
Youtube Videos
Media type: web; 875,297 views; 0 favorites; 0 comments
Internet Archive crawldata from YouTube Video archiving project 2011, captured by crawl440.us.archive.org:youtube from Sat Jul 21 05:39:04 PDT 2012 to Fri Jul 20 23:46:43 PDT 2012.
Topic: crawldata
recurrence=WEEKLY, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=3, accountId=575, organizationName="Chicago-Kent College of Law", collectionId=2817, collectionName="Chicago-Kent College of Law"
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=12, accountId=593, organizationName="Government Printing Office", collectionId=3142, collectionName="CFPB"
Japan Earthquake
Media type: web; 834,010 views; 0 favorites; 0 comments
recurrence=NONE, maxDuration=259200, maxDocumentCount=null, isTestCrawl=false, seedCount=507, accountId=156, organizationName="Virginia Tech: Crisis, Tragedy, and Recovery Network", collectionId=2438, collectionName="Japan Earthquake"
Live Web Proxy Crawls
Media type: web; 804,047 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 12:37:06 PST 2012 to Tue Jan 3 06:31:55 PST 2012.
Topic: crawldata
The Archive Team Just In Time Grabs
Media type: web; 783,235 views; 0 favorites; 0 comments
Election Crawl 2012
Media type: web; 607,614 views; 0 favorites; 0 comments
Internet Archive crawldata uploaded by crawling119.us.archive.org:COL-ELECTION2012 from Fri Jun 29 21:24:09 PDT 2012 to Thu Jan 10 20:37:50 PST 2013.
Topic: crawldata
Wikipedia Outlinks February 2012
Media type: web; 601,396 views; 0 favorites; 0 comments
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Thu Mar 1 20:56:37 PST 2012 to Thu Mar 1 14:19:48 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 541,369 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat May 12 21:38:20 PDT 2012 to Sat May 12 18:43:53 PDT 2012.
Topic: crawldata
Wide Crawl started September 2012
Media type: web; 528,668 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Wed Oct 17 08:14:47 PDT 2012 to Wed Oct 17 02:41:59 PDT 2012.
Topic: crawldata
Alexa Crawls
by thumper2.php
Media type: web; 480,823 views; 0 favorites; 0 comments
Alexa crawl
Topic: crawldata
Wikipedia Outlinks February 2012
Media type: web; 455,631 views; 0 favorites; 0 comments
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Thu Oct 18 19:09:44 PDT 2012 to Thu Oct 18 13:30:19 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 439,280 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Wed Jan 11 15:00:58 PST 2012 to Wed Jan 11 07:47:03 PST 2012.
Topic: crawldata
Wide Crawl started September 2012
Media type: web; 434,523 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Fri Oct 19 01:17:49 PDT 2012 to Thu Oct 18 21:38:24 PDT 2012.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 432,103 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 18:18:33 PST 2012 to Thu Jan 5 11:30:47 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 419,467 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 20:28:12 PST 2012 to Thu Jan 5 13:25:29 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 409,751 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Wed Mar 7 23:01:46 PST 2012 to Wed Mar 7 17:53:26 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 397,774 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 11:14:32 PST 2012 to Tue Jan 3 04:36:49 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 395,246 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Jan 3 14:35:17 PST 2012 to Tue Jan 3 08:02:37 PST 2012.
Topic: crawldata
Archive Team
Media type: web; 388,173 views; 0 favorites; 0 comments
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Live Web Proxy Crawls
Media type: web; 384,479 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-11-22T06:44:42 UTC to 2012-11-22T18:14:08 UTC.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 383,885 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 16:52:35 PST 2012 to Thu Jan 5 10:17:00 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 368,326 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Tue Apr 17 00:58:46 PDT 2012 to Mon Apr 16 20:37:31 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 367,969 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-13T10:53:21 UTC to 2012-12-13T16:53:32 UTC.
Topic: crawldata
Wide Crawl started September 2012
Media type: web; 361,242 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl410.us.archive.org:wide from Wed Nov 7 02:28:32 PST 2012 to Tue Nov 6 20:50:40 PST 2012.
Topic: crawldata
Archive Team
Media type: web; 358,562 views; 0 favorites; 0 comments
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
NLS_2012
Media type: web; 351,071 views; 0 favorites; 0 comments
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sat Jun 16 16:29:25 PDT 2012 to Wed Dec 12 11:44:11 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 350,134 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-06-26T02:05:01 UTC to 2012-06-26T10:40:25 UTC.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 347,463 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 15:53:45 PST 2012 to Thu Jan 5 08:51:54 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 345,686 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-31T05:25:29 UTC to 2013-01-01T02:09:59 UTC.
Topic: crawldata
google.co.jp
Media type: web; 345,539 views; 0 favorites; 0 comments
Source: google.co.jp
NLS_2012
Media type: web; 340,142 views; 0 favorites; 0 comments
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Wed Oct 31 04:28:48 PDT 2012 to Wed Dec 12 01:48:33 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 335,371 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T21:29:12 UTC to 2013-01-23T21:42:32 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 333,524 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-30T15:07:01 UTC to 2012-12-31T14:15:26 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 316,276 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Mon Mar 5 18:11:24 PST 2012 to Mon Mar 5 12:34:16 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 309,716 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-18T16:45:10 UTC to 2012-12-19T09:20:10 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 306,886 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-14T17:34:12 UTC to 2012-12-15T10:27:19 UTC.
Topic: crawldata
Wide Crawl started April 2012
Media type: web; 302,954 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Jun 8 07:28:07 PDT 2012 to Fri Jun 8 02:31:40 PDT 2012.
Topic: crawldata
Archive Team
Media type: web; 302,596 views; 0 favorites; 0 comments
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
The Archive Team Just In Time Grabs
Media type: web; 299,063 views; 1 favorite; 0 comments
Live Web Proxy Crawls
Media type: web; 294,891 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-11-22T04:31:46 UTC to 2012-11-22T19:27:11 UTC.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 292,702 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 19:32:22 PST 2012 to Thu Jan 5 12:20:54 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
Media type: web; 290,456 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Wed Jan 18 17:01:06 PST 2012 to Wed Jan 18 09:17:47 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 289,104 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Tue Feb 28 04:16:59 PST 2012 to Tue Feb 28 03:44:00 PST 2012.
Topic: crawldata
Wide Crawl started April 2012
Media type: web; 289,000 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun May 13 07:09:39 PDT 2012 to Sun May 13 01:49:20 PDT 2012.
Topic: crawldata
Archive Team
Media type: web; 287,581 views; 0 favorites; 0 comments
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
Wikipedia Outlinks February 2012
Media type: web; 285,467 views; 0 favorites; 0 comments
Internet Archive crawldata from Wikipedia outbound links, captured by crawl435.us.archive.org:wpo from Sat Feb 11 14:58:31 PST 2012 to Sat Feb 11 08:33:22 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 285,356 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-30T19:10:53 UTC to 2012-12-31T10:26:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 284,534 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Jan 1 09:53:25 PST 2012 to Sun Jan 1 03:42:17 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 283,461 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-30T04:16:42 UTC to 2012-07-30T11:28:26 UTC.
Topic: crawldata
ask.com
Media type: web; 283,431 views; 0 favorites; 0 comments
Source: ask.com
Live Web Proxy Crawls
Media type: web; 282,317 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sun Feb 12 06:50:14 PST 2012 to Sun Feb 12 02:22:07 PST 2012.
Topic: crawldata
reddit.com
Media type: web; 281,484 views; 0 favorites; 0 comments
Source: reddit.com
reddit.com
Media type: web; 280,045 views; 0 favorites; 0 comments
Source: reddit.com
Wide Crawl started April 2012
Media type: web; 279,017 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu May 3 12:36:06 PDT 2012 to Thu May 3 07:30:25 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 278,863 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T00:05:42 UTC to 2012-12-27T16:35:17 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 274,864 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-10-30T19:58:33 UTC to 2012-10-31T18:47:34 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 274,092 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-28T19:46:00 UTC to 2012-12-29T15:05:14 UTC.
Topic: crawldata
Archive Team
Media type: web; 272,937 views; 0 favorites; 0 comments
This item contains regular captures of the homepages of Dutch news websites in screenshot and WARC format. Websites: nos.nl, teletekst.nos.nl, rtlnieuws.nl, nu.nl, telegraaf.nl, metronieuws.nl, spitsnieuws.nl, volkskrant.nl, nrc.nl, trouw.nl, parool.nl, fd.nl, refdag.nl
ameblo.jp
Media type: web; 272,833 views; 0 favorites; 0 comments
Source: ameblo.jp
Live Web Proxy Crawls
Media type: web; 271,896 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-07T06:06:22 UTC to 2012-12-10T21:54:23 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 271,767 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-10-15T10:26:27 UTC to 2012-10-15T18:41:53 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 271,495 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-gen1.us.archive.org from 2012-07-27T10:56:04 UTC to 2012-07-27T17:15:08 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 268,408 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-20T18:25:28 UTC to 2012-12-21T16:41:39 UTC.
Topic: crawldata
NLS_2012
Media type: web; 266,703 views; 0 favorites; 0 comments
Internet Archive crawldata uploaded by selenium-101.us.archive.org:NLS-CRAWL-004 from Sun Jul 8 09:00:15 PDT 2012 to Wed Dec 12 01:48:28 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 265,595 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBackMachine, captured by wwwb-gen1.us.archive.org:wbm from Sat Mar 24 09:36:36 PDT 2012 to Sat Mar 24 08:42:19 PDT 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 264,959 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-14T22:44:21 UTC to 2012-12-15T05:16:24 UTC.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 262,771 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-12-27T07:49:06 UTC to 2012-12-28T07:29:11 UTC.
Topic: crawldata
yahoo.co.jp
Media type: web; 262,281 views; 0 favorites; 0 comments
Source: yahoo.co.jp
Wide Crawl started September 2012
Media type: web; 262,057 views; 0 favorites; 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Wed Nov 14 17:39:11 PST 2012 to Wed Nov 14 10:55:11 PST 2012.
Topic: crawldata
Live Web Proxy Crawls
Media type: web; 261,962 views; 0 favorites; 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-12-28T22:26:32 UTC to 2012-12-29T15:19:50 UTC.
Topic: crawldata